Lessons in Neural Network Training: Overfitting May be Harder than Expected
Authors
Steve Lawrence, C. Lee Giles, Ah Chung Tsoi
Abstract
For many reasons, neural networks have become very popular machine learning models. Two of the most important aspects of a machine learning model are how well it generalizes to unseen data and how well it scales with problem complexity. Using a controlled task with known optimal training error, we investigate the convergence of the backpropagation (BP) algorithm. We find that the optimal solution is typically not found. Furthermore, we observe that networks larger than might be expected can result in lower training and generalization error. This result is supported by another real-world example. We further investigate the training behavior by analyzing the weights in trained networks (excess degrees of freedom are seen to do little harm and to aid convergence) and by contrasting the interpolation characteristics of multi-layer perceptron neural networks (MLPs) and polynomial models (the overfitting behavior is very different: the MLP is often biased towards smoother solutions). Finally, we analyze relevant theory outlining the reasons for significant practical differences. These results call into question common beliefs about neural network training regarding convergence and optimal network size, suggest alternate guidelines for practical use (less fear of excess degrees of freedom), and help to direct future work (e.g., methods for creating more parsimonious solutions, the importance of the MLP/BP bias, and the possibly worse performance of "improved" training algorithms).

Introduction

Neural networks are one of the most popular machine learning models, and much has been written about them. A common belief is that the number of parameters in the network should be related to the number of data points and to the expressive power of the network. The results in this paper suggest that the characteristics of the training algorithm should also be considered.

Generalization and Overfitting

Neural networks and other machine learning models are prone to "overfitting". Figure 1 illustrates the concept using polynomial approximation. A training dataset was created which contained 21 points according to the equation
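The following is a minimal sketch of this kind of polynomial overfitting experiment: least-squares polynomial fits of increasing degree to 21 noisy samples. The target function sin(x/3), the noise level, and the input range are illustrative assumptions rather than the paper's settings, as is the use of NumPy's polynomial fitting routines.

```python
# Fit polynomials of increasing degree to 21 noisy points and report how
# the fit behaves between the training samples. Target function, noise
# level, and input range are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# 21 noisy training points from an assumed smooth target.
x_train = np.linspace(0, 20, 21)
y_train = np.sin(x_train / 3.0) + rng.uniform(-0.25, 0.25, size=x_train.shape)

# Dense grid for inspecting behavior *between* the training points.
x_grid = np.linspace(0, 20, 401)

for degree in (2, 10, 20):
    # Least-squares polynomial fit; high degrees may trigger a RankWarning
    # because the problem is poorly conditioned, but the fit still runs.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    y_grid = np.polyval(coeffs, x_grid)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, "
          f"range between points = [{y_grid.min():.1f}, {y_grid.max():.1f}]")
```

The higher-degree fits drive the training error toward zero while the fitted curve can swing far outside the data range between samples; this is the interpolation behavior that the paper contrasts with the typically smoother solutions found by MLPs.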
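The claim that BP typically does not find the optimal solution can be illustrated with a student-teacher setup, one way of building a controlled task with known optimal training error: targets are generated by a fixed random "teacher" network, so a student at least as large as the teacher could in principle reach zero training error. This is a sketch only; the network sizes, the dataset size, and the use of scikit-learn's MLPRegressor as the backpropagation trainer are assumptions, not the paper's experimental settings.

```python
# Student-teacher sketch: a fixed random "teacher" MLP generates noise-free
# targets, so the optimal training MSE is exactly zero for any student at
# least as large as the teacher. Sizes and trainer are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Teacher: one hidden layer of 10 tanh units with fixed random weights.
n_in, n_teacher = 5, 10
W1 = rng.normal(size=(n_in, n_teacher))
w2 = rng.normal(size=n_teacher)

X = rng.uniform(-1.0, 1.0, size=(200, n_in))
y = np.tanh(X @ W1) @ w2  # noise-free targets: optimal train MSE is 0

# Students of increasing size, all trained with plain SGD backpropagation.
# Training may stop at max_iter with a ConvergenceWarning, i.e. without
# reaching the known optimum; that plateau is the behavior of interest.
for n_student in (10, 20, 40):
    student = MLPRegressor(hidden_layer_sizes=(n_student,), activation="tanh",
                           solver="sgd", learning_rate_init=0.01,
                           max_iter=2000, random_state=1)
    student.fit(X, y)
    mse = np.mean((student.predict(X) - y) ** 2)
    print(f"student with {n_student:2d} hidden units: train MSE = {mse:.5f}")
```

Individual runs of this sketch vary with the random seeds, but the qualitative pattern the paper reports (the minimally sized student often failing to train to the known optimum, with excess degrees of freedom aiding convergence) is what the comparison is meant to surface.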